
    CMIR-NET : A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing

    We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech-based label annotations. These multi-modal retrieval scenarios are more challenging than traditional uni-modal retrieval given the inherent differences in distributions between the modalities. However, with the growing availability of multi-source remote sensing data and the scarcity of semantic annotations, the task of multi-modal retrieval has recently become extremely important. In this regard, we propose a novel deep neural network based architecture that learns a discriminative shared feature space for all the input modalities, suitable for semantically coherent information retrieval. Extensive experiments are carried out on the benchmark large-scale PAN/multi-spectral DSRSID dataset and the multi-label UC-Merced dataset. For the UC-Merced dataset, we additionally generate a corpus of speech signals corresponding to the labels. Superior performance with respect to the current state of the art is observed in all the cases.
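
    The core idea in this kind of cross-modal retrieval model is to project each modality through its own encoder into one shared, class-discriminative space and then rank gallery items by similarity. Below is a minimal PyTorch sketch of that general pattern; the layer sizes, module names, and the simple cosine-similarity ranking are illustrative assumptions, not the authors' exact CMIR-NET architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpaceRetrieval(nn.Module):
    # Two modality-specific encoders feeding one shared space, with a
    # shared classifier that keeps the space class-discriminative.
    def __init__(self, pan_dim, ms_dim, shared_dim, num_classes):
        super().__init__()
        self.pan_encoder = nn.Sequential(
            nn.Linear(pan_dim, 512), nn.ReLU(), nn.Linear(512, shared_dim))
        self.ms_encoder = nn.Sequential(
            nn.Linear(ms_dim, 512), nn.ReLU(), nn.Linear(512, shared_dim))
        self.classifier = nn.Linear(shared_dim, num_classes)

    def forward(self, pan_feat, ms_feat):
        # L2-normalized embeddings so cosine similarity is a dot product.
        z_pan = F.normalize(self.pan_encoder(pan_feat), dim=-1)
        z_ms = F.normalize(self.ms_encoder(ms_feat), dim=-1)
        return z_pan, z_ms, self.classifier(z_pan), self.classifier(z_ms)

def rank_gallery(query_z, gallery_z):
    # Cross-modal retrieval: rank every gallery item for each query.
    return (query_z @ gallery_z.t()).argsort(dim=1, descending=True)
```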

    Zero-Shot Sketch Based Image Retrieval using Graph Transformer

    The performance of a zero-shot sketch-based image retrieval (ZS-SBIR) task is primarily affected by two challenges: the substantial domain gap between image and sketch features needs to be bridged, and the semantic side information has to be chosen carefully. Existing literature has shown that varying the semantic side information greatly affects ZS-SBIR performance. To this end, we propose a graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework, in which a novel graph transformer preserves the topology of the classes in the semantic space and propagates the context graph of the classes into the embedding features of the visual space. To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space. We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gaps of all other classes in the training set. Experimental results obtained on the extended Sketchy, TU-Berlin, and QuickDraw datasets exhibit sharp improvements over the existing state-of-the-art methods in both ZS-SBIR and generalized ZS-SBIR.
    Comment: Accepted at ICPR 202
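
    The paper minimizes a Wasserstein distance between image and sketch features in a learned shared space. Exact Wasserstein distances are expensive to compute, so a common differentiable stand-in is the sliced approximation sketched below; whether GTZSR uses this particular estimator is an assumption on our part.

```python
import torch

def sliced_wasserstein(x, y, n_projections=64):
    # Sliced Wasserstein-2 between two feature batches: project onto
    # random unit directions, sort, and compare the 1-D quantiles.
    # Assumes x and y are both (B, D) with the same batch size B.
    d = x.size(1)
    theta = torch.randn(d, n_projections, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)   # unit-norm directions
    x_proj, _ = torch.sort(x @ theta, dim=0)          # (B, n_projections)
    y_proj, _ = torch.sort(y @ theta, dim=0)
    return ((x_proj - y_proj) ** 2).mean()

# Usage idea: add the distance to the training objective so the two
# modality distributions are pulled together in the shared space, e.g.
# loss = task_loss + lambda_w * sliced_wasserstein(image_z, sketch_z)
```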

    CrossATNet - a novel cross-attention based framework for sketch-based image retrieval

    We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema considers simultaneous mappings between the two image views and the semantic side information; it is therefore desirable to handle fine-grained classes, particularly in the sketch domain, using a highly discriminative and semantically rich feature space. However, the existing deep generative modeling based SBIR approaches focus mainly on bridging the gap between the seen and unseen classes by generating pseudo-unseen-class samples. Moreover, although such techniques adhere to the ZSL protocol by not utilizing any unseen-class information during training, they do not pay explicit attention to modeling the discriminative nature of the shared space. We also note that learning a unified feature space for both multi-view visual modalities is a tedious task given the significant domain difference between sketches and color images. As a remedy, we introduce a novel framework for zero-shot SBIR. We define a cross-modal triplet loss to ensure the discriminative nature of the shared space, and we propose an innovative cross-modal attention learning strategy that guides feature extraction from the image domain by exploiting information from the corresponding sketch counterpart. To preserve the semantic consistency of the shared space, we employ a graph CNN based module that propagates the semantic class topology to the shared space. To ensure an improved response time during inference, we further explore representing the shared space in terms of hash codes. Experimental results obtained on the benchmark TU-Berlin and Sketchy datasets confirm the superiority of CrossATNet in yielding state-of-the-art results.
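
    The cross-modal triplet loss mentioned above can be written down concretely: sketches act as anchors, images of the same class as positives, and images of other classes as negatives. The batch-hard mining and the margin value in this PyTorch sketch are illustrative assumptions, not necessarily the exact loss used in CrossATNet.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(sketch_z, image_z, labels, margin=0.2):
    # Distance from every sketch (anchor) to every image in the batch.
    dist = torch.cdist(sketch_z, image_z)                 # (B, B)
    same = labels.unsqueeze(1) == labels.unsqueeze(0)     # class-match mask
    inf = torch.full_like(dist, float('inf'))
    # Hardest positive: farthest same-class image; hardest negative:
    # closest different-class image (batch-hard mining).
    hardest_pos = torch.where(same, dist, -inf).max(dim=1).values
    hardest_neg = torch.where(same, inf, dist).min(dim=1).values
    # Rows lacking a positive or negative produce -inf inside the
    # relu and therefore contribute zero loss.
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```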

    A Simplified Framework for Zero-shot Cross-Modal Sketch Data Retrieval

    We deal with the problem of zero-shot cross-modal image retrieval involving color and sketch images through a novel deep representation learning technique. The problem of sketch-to-image retrieval and vice versa is of practical importance, and a trained model in this respect is expected to generalize beyond the training classes, i.e., the zero-shot learning scenario. Nonetheless, considering the drastic distribution gap between the two modalities, a feature alignment is necessary to learn a shared feature space where retrieval can efficiently be carried out. Additionally, it should also be guaranteed that the shared space is semantically meaningful to aid in the zero-shot retrieval task. The very few existing techniques for zero-shot sketch-RGB image retrieval extend deep generative models for learning the embedding space; however, training a typical GAN-like model for multi-modal image data may be non-trivial at times. To this end, we propose a multi-stream encoder-decoder model that simultaneously ensures improved mapping between the RGB and sketch image spaces and high discrimination in the shared semantics-driven encoded feature space. Further, it is guaranteed that the class topology of the original semantic space is preserved in the encoded feature space, which subsequently reduces the model bias towards the training classes. Experimental results obtained on the benchmark Sketchy and TU-Berlin datasets establish the efficacy of our model as we outperform the existing state-of-the-art techniques by a considerable margin.
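
    One plausible way to read "the class topology of the original semantic space is preserved in the encoded feature space" is to match the pairwise-distance structure of the classes across the two spaces. The sketch below implements that reading in PyTorch; the function name and normalization are our assumptions, and the exact loss used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def topology_preservation_loss(class_semantics, class_prototypes):
    # class_semantics: (C, Ds) semantic vectors (e.g., word embeddings).
    # class_prototypes: (C, De) per-class means in the encoded space.
    d_sem = torch.cdist(class_semantics, class_semantics)
    d_enc = torch.cdist(class_prototypes, class_prototypes)
    # Normalize scale so only the relative class geometry is compared.
    d_sem = d_sem / (d_sem.max() + 1e-8)
    d_enc = d_enc / (d_enc.max() + 1e-8)
    return F.mse_loss(d_enc, d_sem)
```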

    Synergistic Use of TanDEM-X and Landsat-8 Data for Crop-Type Classification and Monitoring

    Classification of crop types using Earth Observation (EO) data is a challenging task, and the challenge increases manifold when diverse crops occupy a single resolution cell. In this regard, optical and Synthetic Aperture Radar (SAR) data provide complementary information to characterize a target. We therefore propose to leverage the synergy between multispectral and SAR data for crop classification. We use the newly developed model-free three-component scattering power components to quantify changes in scattering mechanisms at different phenological stages. By incorporating interferometric coherence information, we capture morphological characteristics of the crops that are not available from polarimetric information alone. We also utilize the reflectance values from the Landsat-8 spectral bands as complementary biochemical information about the crops. The classification accuracy is enhanced by combining these two sources of information through a neural-network-based architecture with an attention mechanism. We utilize time-series dual co-polarimetric (i.e., HH–VV) TanDEM-X SAR data and multispectral Landsat-8 data acquired over an agricultural area in Seville, Spain. The proposed attention mechanism for fusing SAR and optical data improves classification accuracy by 6.0% to 9.0% compared to the sole use of either the optical or the SAR data. We also demonstrate that utilizing single-pass interferometric coherence maps in the fusion framework enhances the overall classification accuracy by ≈ 3.0%. The proposed synergistic approach will therefore facilitate accurate and robust crop mapping with high-resolution EO data at larger scales.
    This work was supported in part by the German Aerospace Center (DLR), which provided all the TanDEM-X data under project POLI6736; in part by the State Research Agency (AEI); in part by the Spanish Ministry of Science and Innovation; and in part by the EU EFDR funds under Project TEC2017-85244-C2-1-P. The work of N. Bhogapurapu and S. Dey was supported by the Ministry of Education (formerly Ministry of Human Resource Development, MHRD), Government of India.
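
    The fusion step described above, in which attention weights decide per sample how much the SAR and optical streams contribute, can be sketched as a small gated module. The projection sizes, tanh activations, and softmax gating below are illustrative assumptions; the paper's architecture is more elaborate.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    # Gated fusion of SAR and optical feature vectors (illustrative).
    def __init__(self, sar_dim, opt_dim, fused_dim, num_classes):
        super().__init__()
        self.sar_proj = nn.Linear(sar_dim, fused_dim)
        self.opt_proj = nn.Linear(opt_dim, fused_dim)
        # Scores how much each modality contributes for each sample.
        self.attn = nn.Linear(2 * fused_dim, 2)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, sar_feat, opt_feat):
        s = torch.tanh(self.sar_proj(sar_feat))
        o = torch.tanh(self.opt_proj(opt_feat))
        w = torch.softmax(self.attn(torch.cat([s, o], dim=-1)), dim=-1)
        fused = w[..., 0:1] * s + w[..., 1:2] * o  # attention-weighted sum
        return self.classifier(fused)
```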